feat(dataframe): add writeJson with JsonWriteOptions #61
Merged
Mirror writeParquet's surface for newline-delimited JSON. JsonWriteOptions exposes singleFileOutput, partitionCols, and fileCompressionType; the DataFusion-side JsonOptions only carries compression in writer mode (the read-side toggles like newline_delimited and schema_infer_max_rec do not apply here).

JsonOptions has no fluent setters, so the native handler builds it via struct-update syntax (same idiom as ArrowReadOptions / AvroReadOptions). Option<JsonOptions> stays None when no writer-side knob is set, so DataFusion's runtime defaults are preserved when callers pass new JsonWriteOptions().

When the caller leaves singleFileOutput unset, default to directory output (with_single_file_output(false)) rather than DataFusion's Automatic mode. Automatic treats extension-bearing paths like "out.json" as single-file targets, which would silently contradict the documented "directory unless overridden" default.
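A minimal Rust sketch of the two decisions described in this commit message, written against stand-in types rather than the real DataFusion or protobuf ones. `JsonOptions`, `Compression`, and `WriteRequest` here are simplified, hypothetical models (the actual structs have more fields, and partitionCols is left out); `build_json_options` and `resolve_single_file_output` are illustrative names, not functions from this PR.

```rust
/// Stand-in for a compression setting (hypothetical, two variants only).
#[derive(Debug, Clone, Copy, Default, PartialEq)]
enum Compression {
    #[default]
    Uncompressed,
    Gzip,
}

/// Stand-in for the DataFusion-side JSON options: no fluent setters,
/// just public fields plus a Default impl.
#[derive(Debug, Clone, Default, PartialEq)]
struct JsonOptions {
    compression: Compression,
    /// Read-side knob; irrelevant when writing, so it stays at its default.
    schema_infer_max_rec: Option<usize>,
}

/// Stand-in for the decoded write-options message (hypothetical field names).
/// `None` means the caller never set the knob.
#[derive(Debug, Default)]
struct WriteRequest {
    single_file_output: Option<bool>,
    file_compression_type: Option<Compression>,
}

/// Returns `None` when no writer-side knob is set, so the engine's own
/// defaults apply. Otherwise overrides only `compression` and fills the
/// remaining fields through struct-update syntax.
fn build_json_options(req: &WriteRequest) -> Option<JsonOptions> {
    req.file_compression_type.map(|compression| JsonOptions {
        compression,
        ..Default::default()
    })
}

/// An unset flag resolves to directory output (`false`) instead of being
/// left to a path-based heuristic that would treat "out.json" as a file.
fn resolve_single_file_output(req: &WriteRequest) -> bool {
    req.single_file_output.unwrap_or(false)
}
```

With an all-default `WriteRequest`, `build_json_options` yields `None` and `resolve_single_file_output` yields `false`, which corresponds to the "directory unless overridden" behavior for callers who pass new JsonWriteOptions().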
Member
@LantaoJin could you fix conflict? Thanks
…e-json # Conflicts: # core/src/main/java/org/apache/datafusion/DataFrame.java
Contributor
Author
Done
Which issue does this PR close?
Rationale for this change
DataFrame.writeParquet shipped in #27. JSON is the third writer DataFusion's DataFrame API exposes natively (DataFrame::write_json) and is the easiest format to consume from non-Arrow downstream tooling. The implementation follows the same proto-over-JNI pattern as the merged readers, mirrors the writer-side shape we'd land for CSV (#38), and has zero binary-size impact -- DataFusion's JSON support is in the default feature set, no Cargo flag changes required.

What changes are included in this PR?
- proto/json_write_options.proto -- new JsonWriteOptionsProto message
- JsonWriteOptions Java builder
- Java_org_apache_datafusion_DataFrame_writeJsonWithOptions JNI handler in native/src/json.rs

Are these changes tested?
Yes -- 9 new tests across JsonWriteOptionsTest and DataFrameWriteJsonTest.

Are there any user-facing changes?
Yes -- purely additive. New public API:
- org.apache.datafusion.JsonWriteOptions
- DataFrame.writeJson(String)
- DataFrame.writeJson(String, JsonWriteOptions)

The new org.apache.datafusion.protobuf.JsonWriteOptionsProto generated class is also exposed via the protobuf-Java output, consistent with how CsvReadOptionsProto, NdJsonReadOptionsProto, etc. are exposed. No API removals, no deprecations, no behavior change for existing callers. No Cargo feature changes; binary size is unchanged.